Chapter 9
Summarizing and Graphing Your Data
IN THIS CHAPTER
Representing categorical data
Characterizing numerical variables
Putting numerical summaries into tables
Displaying numerical variables with bars and graphs
A large study can involve thousands of participants, hundreds of variables, and millions of individual
data points. You need to summarize this ocean of individual values for each variable down to a few
numbers, called summary statistics, that give readers an idea of what the whole collection of numbers
looks like — that is, how they’re distributed.
When presenting your results, you usually want to arrange these summary statistics into tables that
describe how the variables change over time or differ between categories, or how two or more
variables are related to each other. And, because a picture really is worth a thousand words, you will
want to display these distributions, changes, differences, and relationships graphically. In this chapter,
we show you how to summarize and graph both categorical and numerical data. Note: This chapter
doesn’t cover time-to-event (survival) data, which is the topic of Chapter 22.
Summarizing and Graphing Categorical Data
A categorical variable is summarized by tallying the number of participants in each category and
reporting that tally as a count. You can also express each count as a percentage of the total number of
participants in all categories combined. For example, a sample of 422 participants can be summarized
by health insurance type, as shown in Table 9-1.
TABLE 9-1 Study Participants Categorized by Health Insurance Type
Health Insurance Type    Count    Percent of Total
Commercial                 128    30.3%
Public                     141    33.4%
Military                    70    16.6%
Other                       83    19.7%
Total                      422    100%
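If you want to check the percentages in Table 9-1 yourself, the arithmetic is simply each count divided by the total, times 100. The following is a minimal Python sketch that uses the counts from the table; the variable names (counts, total, percents) are our own choices for illustration, and rounding to one decimal place is an assumption that matches how the table is displayed.

```python
# Counts copied from Table 9-1.
counts = {"Commercial": 128, "Public": 141, "Military": 70, "Other": 83}

# Total number of participants across all categories.
total = sum(counts.values())

# Percent of total for each category, rounded to one decimal place
# (an assumption that matches the table's display format).
percents = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Print a simple aligned summary table.
for name, n in counts.items():
    print(f"{name:<12}{n:>6}{percents[name]:>8.1f}%")
print(f"{'Total':<12}{total:>6}{100.0:>8.1f}%")
```

Because each percentage is rounded separately, the rounded values in a table like this won't always add up to exactly 100 percent, so it's safer to report the total row as 100 percent than to sum the rounded figures.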
The joint distribution of participants across two categorical variables is summarized by a
cross-tabulation (or cross-tab). Table 9-2 shows a cross-tab of the same participants, with type of
health insurance on one axis and the urban-rural classification of their residence on the other.
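To see how a cross-tab is built, it helps to count the participants who fall into each combination of the two categories. The following Python sketch does this for a handful of made-up records. The six records, the residence labels "Urban" and "Rural", and all variable names are hypothetical and chosen only for illustration; they are not the study's data.

```python
from collections import Counter

# Hypothetical (insurance type, residence) records, for illustration only.
records = [
    ("Commercial", "Urban"),
    ("Public", "Rural"),
    ("Commercial", "Urban"),
    ("Military", "Rural"),
    ("Public", "Urban"),
    ("Public", "Rural"),
]

# Count how many participants fall into each combination of categories.
cell_counts = Counter(records)

# Row and column labels, sorted alphabetically.
rows = sorted({insurance for insurance, _ in records})
cols = sorted({residence for _, residence in records})

# Nested dictionary: crosstab[insurance][residence] holds the cell count.
# A Counter returns 0 for combinations that never occur.
crosstab = {r: {c: cell_counts[(r, c)] for c in cols} for r in rows}

# Marginal totals for each row and each column.
row_totals = {r: sum(crosstab[r].values()) for r in rows}
col_totals = {c: sum(crosstab[r][c] for r in rows) for c in cols}

# Print the cross-tab with its margins.
print(f"{'':<12}" + "".join(f"{c:>8}" for c in cols) + f"{'Total':>8}")
for r in rows:
    print(f"{r:<12}" + "".join(f"{crosstab[r][c]:>8}" for c in cols) + f"{row_totals[r]:>8}")
print(f"{'Total':<12}" + "".join(f"{col_totals[c]:>8}" for c in cols) + f"{len(records):>8}")
```

The row totals and column totals are the margins of the cross-tab; each set of margins adds up to the total number of participants.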